Transfer-Entropy-Regularized Markov Decision Processes
Authors
Abstract
We consider the framework of transfer-entropy-regularized Markov decision processes (TERMDPs), in which a weighted sum of the classical state-dependent cost and the transfer entropy from the state random process to the control input process is minimized. Although TERMDPs are generally formulated as nonconvex optimization problems, an analytical necessary optimality condition can be expressed as a finite set of nonlinear equations, based on which an iterative forward–backward computational procedure similar to the Arimoto–Blahut algorithm is developed. It is shown that every limit point of the sequence generated by the proposed algorithm is a stationary point of the TERMDP. Applications are discussed in the context of networked control systems theory and nonequilibrium thermodynamics. The proposed algorithm is applied to an information-constrained maze navigation problem, whereby we study how the price of information qualitatively alters the optimal policies.
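Concretely, the objective described above can be written, under a plausible reading of the abstract, as the following finite-horizon program; the horizon T, the price of information β, and the exact conditioning structure of the policies are assumptions here, not taken verbatim from the paper:

```latex
\min_{\{q_t(u_t \mid x^t,\, u^{t-1})\}_{t=0}^{T-1}}
\;\mathbb{E}\!\left[\sum_{t=0}^{T-1} c_t(X_t, U_t)\right]
\;+\; \beta\, I(X^{T-1} \to U^{T-1}),
\qquad
I(X^{T-1} \to U^{T-1}) \;=\; \sum_{t=0}^{T-1} I(X^t;\, U_t \mid U^{t-1}),
```

where β ≥ 0 prices the transfer entropy (directed information) flowing from the state process to the control process.

The sketch below illustrates the kind of forward–backward, Arimoto–Blahut-style iteration the abstract refers to, in the simplified setting where each stage's information cost is measured against a state-independent action prior r_t(u), so the per-stage regularizer reduces to a mutual-information term. It is a minimal illustration under those assumptions, not the paper's algorithm; all names (termdp_sketch, P, c, beta) are invented for the example.

```python
import numpy as np

def termdp_sketch(P, c, p0, beta, T, n_iters=200):
    """Forward-backward, Blahut-Arimoto-style iteration (illustrative sketch).

    P    : (nx, nu, nx) array, P[x, u, y] = Prob(x_{t+1}=y | x_t=x, u_t=u)
    c    : (nx, nu) array of stage costs c(x, u)
    p0   : (nx,) initial state distribution
    beta : price of information (weight on the information cost)
    T    : horizon length
    """
    nx, nu = c.shape
    r = np.full((T, nu), 1.0 / nu)         # per-stage action priors r_t(u)
    q = np.zeros((T, nx, nu))              # randomized policies q_t(u | x)
    for _ in range(n_iters):
        # Backward pass: soft Bellman recursion against the current priors.
        V = np.zeros(nx)                   # terminal value V_T(x) = 0
        for t in reversed(range(T)):
            Q = c + P @ V                  # Q_t(x,u) = c(x,u) + E[V_{t+1}(x')]
            a = np.log(r[t]) - Q / beta    # prior-weighted soft-min logits
            m = a.max(axis=1, keepdims=True)
            q[t] = np.exp(a - m)
            q[t] /= q[t].sum(axis=1, keepdims=True)
            # V_t(x) = -beta * log sum_u r_t(u) * exp(-Q(x,u) / beta)
            V = -beta * (m[:, 0] + np.log(np.exp(a - m).sum(axis=1)))
        # Forward pass: propagate the state law and refit the priors.
        p = p0.copy()
        for t in range(T):
            r[t] = q[t].T @ p              # r_t(u) = sum_x p_t(x) q_t(u|x)
            p = np.einsum('x,xu,xuy->y', p, q[t], P)
    return q, r

# Example usage on a small random MDP (illustrative only):
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((3, 2))
q, r = termdp_sketch(P, c, p0=np.ones(3) / 3, beta=0.5, T=5)
```

As beta → 0 the backward pass reduces to ordinary value iteration, while a large beta pushes the policies q_t(u|x) toward the state-independent priors r_t(u), i.e., toward controllers that consume no state information; this is the qualitative "price of information" trade-off studied in the maze navigation example.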
Similar resources
A unified view of entropy-regularized Markov decision processes
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization... (a hedged sketch of such a regularized LP appears after this list)
Finite State Markov Decision Processes with Transfer Entropy Costs
We consider a mathematical framework of finite state Markov Decision Processes (MDPs) in which a weighted sum of the classical state-dependent cost and the transfer entropy from the state random process to the control random process is minimized. Physical interpretations of the considered MDPs are provided in the context of networked control systems theory and non-equilibrium thermodynamics. Ba...
Clustering Markov Decision Processes For Continual Transfer
We present algorithms to effectively represent a set of Markov decision processes (MDPs), whose optimal policies have already been learned, by a smaller source subset for lifelong, policy-reusebased transfer learning in reinforcement learning. This is necessary when the number of previous tasks is large and the cost of measuring similarity counteracts the benefit of transfer. The source subset ...
Knowledge Transfer in Markov Decision Processes
Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP, for which an optimal policy is known, and modify this policy as needed. We present a framework for ...
Transition Entropy in Partially Observable Markov Decision Processes
This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDP). The algorithm is based in a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance...
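Regarding the first item above ("A unified view of entropy-regularized Markov decision processes"), the linear-programming extension it mentions can be sketched as follows: start from the classical average-reward LP over stationary state-action occupancy measures μ(x,u) and add a conditional-entropy term. The weight η and the sign conventions here are assumptions chosen for illustration:

```latex
\max_{\mu \ge 0} \;\; \sum_{x,u} \mu(x,u)\, r(x,u)
\;-\; \frac{1}{\eta} \sum_{x,u} \mu(x,u) \log \frac{\mu(x,u)}{\sum_{u'} \mu(x,u')}
\quad \text{s.t.} \quad
\sum_{u} \mu(x',u) = \sum_{x,u} P(x' \mid x,u)\, \mu(x,u) \;\; \forall x',
\qquad \sum_{x,u} \mu(x,u) = 1.
```

The subtracted term equals -(1/η) times the conditional entropy H(U|X) of the state-action distribution; since negative conditional entropy is jointly convex in μ, the regularized problem remains a concave maximization over the same polytope, which is the kind of convex-regularized LP extension the snippet describes.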
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2022
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2021.3069347